Random Networks Tossing Biased Coins 
In statistical mechanical investigations on complex networks, it is useful to employ random graph
ensembles as null models, to compare with experimental realizations. Motivated by transcription
networks, we present here a simple way to generate an ensemble of random directed graphs with, 
asymptotically, scale-free outdegree and compact indegree. Entries in each row of the adjacency 
matrix are set to be zero or one according to the toss of a biased coin, with a chosen probability 
distribution for the biases. This defines a quick and simple algorithm, which yields good results 
already for graphs of size n ~ 100. Perhaps more importantly, many of the relevant observables 
are accessible analytically, improving upon previous estimates for similar graphs. The technique is 
easily generalizable to different kinds of graphs. 



I. INTRODUCTION. 

In our investigation concerning transcription networks, 
we came across a simple and effective way to generate a 
random ensemble of directed graphs having similar fea- 
tures as the experimental ones. Transcription networks 
are directed graphs that represent regulatory interactions 
between genes. Specifically, the link $a \to b$ exists if the
protein coded by gene a affects the transcription of gene
b in mRNA form by binding along DNA in a site
upstream of its coding region [1]. For a few organisms, such
as E. coli and S. cerevisiae, a significant fraction of the 
wiring diagram of this network is known [2-5] . The hope 
is to study these graphs, together with the available infor- 
mation on the genes and the physics/chemistry of their 
interactions, to infer information on the large-scale ar- 
chitecture and evolution of gene regulation in living sys- 
tems. In this program, the simplest approach to take is 
to study the topology of the interaction networks. For 
instance, order parameters such as the connectivity and 
the clustering coefficient have been considered [3] . 

To assess a topological feature of a network, one typ- 
ically generates so called "randomized counterparts" of 
the original data set as a null model, that is, an ensemble
of random networks which bear some resemblance to
the original. The idea behind it is to establish when
and to what extent the observed biological topology, and
thus loosely the living system under examination, deviates from
the "typical case" statistics of the null ensemble. For
example, a topological feature that has led to relevant
biological findings, in particular for transcription, is the
occurrence of small subgraphs, or "network motifs" [6-10].
The choice of what feature to conserve (or not) in the

FIG. 1: Example of a graph generated with our algorithm
with $n = 40$, $\beta = 2.8$, $\alpha = 1$. Nodes with more than ten
outgoing edges are larger (red online).



randomized counterpart is quite delicate and depends on 
specific considerations on the system [11]. The null en- 
semble used to discover motifs usually conserves the de- 
gree sequences of the original network, that is, the num- 
ber of regulators and targets of each node. The observed 
degree sequences for the known transcription networks 
follow a scale-free distribution for the outdegree, with 
exponent between one and two, while being Poissonian 
in the indegree [3, 14]. Motifs are then interpreted as
elementary circuit-like building blocks and have been shown
in many cases to work independently [15]. In connection
with this line of research, it is interesting to study
random ensembles of graphs with probability distribu- 
tions for the degree sequences that are similar to those 
observed experimentally, with the objective of character- 
izing theoretically some relevant topological observables, 
such as the subgraph distributions [11, 16]. Here, we 
describe a simple, and fast, algorithm that performs this 
task by tossing coins with prescribed random biases.
Unlike more sophisticated techniques available in
the literature [11, 17-20], our method is not designed 
to conserve a degree sequence, but rather as a general 
random graph model, that, in particular, can be used 
to generate graphs with degree distributions that agree 
with the observed power-law distributed outdegrees and 
compact indegrees [11]. To this aim, the ensemble will be 
generated by a parametric model, where the adjustable 
parameters can be used for fits of real data-sets. Note 
that, with the weaker constraint on the degree distribu- 
tion that we have chosen, it would be very inconvenient 
to generate the ensemble throwing degree sequences a 
priori from the given distributions and then using an 
algorithm designed for fixed degree sequences, which is 
necessarily more costly. We will see that, because of the 
extreme simplicity of our formulation, some observables 
can be computed analytically rather than estimated as 
in ref. [11]. After introducing the algorithm and showing 
that the ensemble has the required features, we will
compute the mean of some observables that are relevant
for transcription, such as the number of triangular subgraphs.



II. ALGORITHM. 



Any directed graph $G_n$ with $n$ nodes is completely described by its adjacency matrix

$$A(G_n) = \left[x^{(n)}_{i,j}\right]_{i,j=1,\dots,n}$$

where $x^{(n)}_{i,j} = 1$ if there is a directed edge $i \to j$, and 0 otherwise.
Instead of square matrices, one may also consider
rectangular matrices with a prescription on the scaling
of the rows with the columns. In what follows we will
deal with rectangular matrices $m_n \times n$ with $m_n \le n$. As
we will see, this is particularly useful for networks with
power-law degree distributions having exponent equal to or
lower than two (for which the average diverges), to keep
the asymptotics well-behaved. In the context of transcription
networks, the hypothesis of rectangularity is
well-motivated by the fact that typically only a subset of
$m_n$ nodes encode transcription factors (namely, they
have outgoing edges). Thus, in an $m_n \times n$ matrix, the
first $m_n$ columns will contain the incoming links to the
transcription factors, and the next $n - m_n$ columns will
correspond to genes that do not encode transcription factors.
Note that in general nodes that send out edges also
receive edges (Fig. 2).

Our model ensemble can be defined by the following
generative algorithm. For each row of A, (i) throw a bias
$\theta$ from a prescribed probability distribution $\pi_n$; (ii) set the
row elements of A to be 0 or 1 according to the toss of a
coin with bias $\theta$. Since each row is thrown independently,
the resulting probability law is

$$P\{x^{(n)}_{i,j} = e_{i,j};\ i = 1,\dots,m_n,\ j = 1,\dots,n\} = \prod_{i=1}^{m_n} \int_{[0,1]} \theta_i^{\sum_{j=1}^{n} e_{i,j}} (1-\theta_i)^{n - \sum_{j=1}^{n} e_{i,j}}\, \pi_n(d\theta_i) \qquad (1)$$



where $e_{i,j} \in \{0,1\}$, $i = 1,\dots,m_n$, $j = 1,\dots,n$. Note
that the row elements are not independent [21]. Eq. (1)

FIG. 2: Example of a rectangular matrix and its associated
graph. Nodes 1 and 2 represent transcription factors, and can
regulate any other node. Nodes 3 to 5 are targets and only
receive incoming links. In this case $m_n = 2n/5$.



is a general probability distribution based on two sym- 
metries: (a) the fact that a node regulates other ones is 
independent from the nodes regulated by other genes (b) 
the identity of the regulated nodes is unimportant, and 
what matters is their number only. The two symmetries 
can be summarized by saying that the indegree and the 
outdegree are uncorrelated [22, 23]. It is worth noticing
that our model could also be seen as a special case
of a directed graph variant of the so-called hidden variables
models, introduced in [24], see also [25, 26]. In this
very general class of undirected random graphs the quantity
$\theta$ is interpreted as the "fitness" of each vertex and
the emphasis is on the problem of how power-laws may
emerge "naturally" in interaction networks. To complete
our model, one has to specify the choice for $\pi_n$, which
determines the behavior of the graph ensemble. We choose
the two-parameter distribution



$$\pi_n(d\theta) = Z^{-1}\, \theta^{-\beta}\, \chi_{(\alpha/n,\,1]}(\theta)\, d\theta \qquad (2)$$

where $\alpha > 0$ and $\beta > 1$ are free parameters, $\chi_{(\alpha/n,1]}$ is the
characteristic function of the interval $(\alpha/n, 1]$, taking the
value one inside the interval and zero everywhere else,
and $Z := \frac{(n/\alpha)^{\beta-1} - 1}{\beta - 1}$ is the normalization constant. In
simple words, Eq. (2) defines the probability to take a
coin with a certain bias $\theta$, which is connected to the
outdegree of the corresponding node. As we will see,
the function $\theta^{-\beta}$ gives a power-law tail to the outdegree.
Conversely, the cutoff on $\theta$ defined by $\alpha$ poses a
constraint on the number of low outdegree nodes, and will
be used to control the indegree distribution. In concrete
applications at finite sizes, it might be useful to introduce
also an upper cutoff on $\pi_n$, that is



$$\pi_n(d\theta) \propto Z^{-1}\, \theta^{-\beta}\, \chi_{(\alpha/n,\,1-\gamma/n]}(\theta)\, d\theta. \qquad (3)$$

This does not affect the asymptotic results given below
but gives more flexibility to the model. Hence, in what
follows, with the exception of Section V, we shall take
$\gamma = 0$.
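
The generative rule of this section can be sketched in a few lines. The snippet below is a minimal illustration in Python using only the standard library, not the authors' implementation; the function names `sample_bias` and `coin_toss_graph` are hypothetical, and drawing $\theta$ by inverting the cumulative distribution of Eq. (3) is one possible choice made here (with `gamma=0` it reduces to Eq. (2)).

```python
import random


def sample_bias(n, alpha, beta, gamma=0.0, rng=random):
    """Draw theta from the density proportional to theta**(-beta) on
    (alpha/n, 1 - gamma/n] by inverting its cumulative distribution."""
    lo, hi = alpha / n, 1.0 - gamma / n
    if not (0.0 < lo < hi <= 1.0) or beta <= 1.0:
        raise ValueError("need 0 < alpha/n < 1 - gamma/n <= 1 and beta > 1")
    e = 1.0 - beta                      # exponent of the antiderivative
    u = rng.random()                    # uniform in [0, 1)
    return (lo**e - u * (lo**e - hi**e)) ** (1.0 / e)


def coin_toss_graph(n, alpha, beta, m=None, gamma=0.0, seed=None):
    """Return (thetas, A). A is an m x n list of 0/1 rows; row i is filled
    by n tosses of a coin whose bias thetas[i] is drawn independently.
    Diagonal entries (self-loops) are tossed like any other entry."""
    rng = random.Random(seed)
    m = n if m is None else m           # square matrix unless m is given
    thetas = [sample_bias(n, alpha, beta, gamma, rng) for _ in range(m)]
    A = [[1 if rng.random() < t else 0 for _ in range(n)] for t in thetas]
    return thetas, A


if __name__ == "__main__":
    thetas, A = coin_toss_graph(n=40, alpha=1.0, beta=2.8, seed=0)
    outdegree = [sum(row) for row in A]          # row sums
    indegree = [sum(col) for col in zip(*A)]     # column sums
    print(max(outdegree), max(indegree))
```

The cost is one uniform draw per matrix entry, so the sketch scales like $m_n n$; it is written for clarity rather than speed.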



III. RESULTS. 

An example of a graph generated with our algorithm
is shown in Fig. 1. The algorithm is quite efficient: its
computational cost is determined by the number of coin
tosses (each of which takes the same number of operations)
and thus scales like $n^2$. Our Fortran 77 implementation,
running under Linux on an AMD Athlon 64
X2 3800+ PC, generates a graph with $n = 10^4$ in about
3.5 seconds. Many observables can be computed knowing
the probability of the link $i \to j$,
$\mu_n := P\{x^{(n)}_{i,j} = 1\} = \int_{[0,1]} \theta\, \pi_n(d\theta)$.
By simple calculation from Eq. (1)
and (2), we get
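
For reference, the explicit expressions can be obtained by evaluating $\mu_n = Z^{-1}\int_{\alpha/n}^{1}\theta^{1-\beta}\,d\theta$ directly under Eq. (2). The block below is a worked derivation from the definitions above, not a transcription of the original display; the shorthand $a := \alpha/n$ is introduced here only for compactness.

```latex
\mu_n =
\begin{cases}
\dfrac{\beta-1}{\beta-2}\; a\; \dfrac{1-a^{\beta-2}}{1-a^{\beta-1}} & \text{if } \beta > 2, \\[2ex]
\dfrac{a\,\log(1/a)}{1-a} & \text{if } \beta = 2, \\[2ex]
\dfrac{\beta-1}{2-\beta}\; a^{\beta-1}\; \dfrac{1-a^{2-\beta}}{1-a^{\beta-1}} & \text{if } 1 < \beta < 2,
\end{cases}
\qquad a := \alpha/n .
```

Each case displays its leading power of $a$ as a prefactor, while the remaining ratio tends to one as $n \to \infty$.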
Note that the formulas above for $\beta > 2$ and $\beta < 2$
are identical, but have been recast to show the leading
terms in the scaling. Hence $\mu_n$, for $n \to \infty$, scales as
$1/n^{\beta-1}$ if $1 < \beta < 2$, as $(\log n)/n$ if $\beta = 2$, and as $1/n$ if
$\beta > 2$. Note that $\mu_n$ is directly related to the mean number
of links in the network, which can thus be controlled
through the parametric dependency of this quantity. We
did not prove anything regarding the emergence of a giant
component. The graphs we generated numerically
seem to have a large component. On the other hand,
analytically, it is not hard to see that the probability that
a graph $G_n$ generated with our technique has only one
connected component goes to zero as n diverges.



A. Degree Distributions. 



The variable $Z_{m_n,j} := \sum_{i=1}^{m_n} x^{(n)}_{i,j}$ represents the
indegree of the j-th node in the random graph, while
$S_{n,i} := \sum_{j=1}^{n} x^{(n)}_{i,j}$ represents the outdegree of the
i-th node ($1 \le i \le m_n$). Clearly, the mean degrees
are equal to $m_n \mu_n$ and $n \mu_n$, respectively. To access
the degree distributions, one has to compute
$P\{S_{n,i} = k\} = \binom{n}{k} \int_{[0,1]} \theta^k (1-\theta)^{n-k}\, \pi_n(d\theta)$ and
$P\{Z_{m_n,j} = k\} = \binom{m_n}{k} \mu_n^k (1-\mu_n)^{m_n-k}$.

Let us concentrate first on the outdegree. An evaluation
of its distribution yields the following asymptotic
law for $n \to \infty$ for any $\alpha > 0$ and $\beta > 1$:

$$P\{S_{n,i} = k\} \sim p_{\alpha,\beta}(k) = \frac{\alpha^{\beta-1}(\beta-1)}{k!} \int_{\alpha}^{+\infty} t^{k-\beta} e^{-t}\, dt.$$



FIG. 3: Degree distributions (in logarithmic scale) of two
graphs generated with our algorithm. The two panels correspond
to graphs having $n = 400$, square adjacency matrices
and different values of the parameters. Top: $\beta = 2.8$, $\alpha = 1$.
Bottom: $\beta = 1.8$, $\alpha = 0.2$. To obtain a compact indegree
distribution in the case of $\beta < 2$ one has to supply smaller
values of $\alpha$. The dashed lines are power-laws with exponent
$\beta$.

It is easy to show that $p_{\alpha,\beta}(k)$ has a power-law tail.
Indeed, if $k > \beta - 1$,
$p_{\alpha,\beta}(k) = \alpha^{\beta-1}(\beta-1)\left(\frac{\Gamma(k+1-\beta)}{\Gamma(k+1)} - \frac{1}{\Gamma(k+1)} \int_0^{\alpha} t^{k-\beta} e^{-t}\, dt\right)$
(where $\Gamma$ indicates the gamma function). Thus, since
$\frac{\Gamma(k+1-\beta)}{\Gamma(k+1)} \sim \frac{1}{k^{\beta}}$ for large k, one concludes that

$$p_{\alpha,\beta}(k) = \frac{1}{k^{\beta}}\left(\alpha^{\beta-1}(\beta-1) + o(1)\right).$$


Fig. 3 shows the degree distributions of numerically generated
examples for $n = 400$. In practice, already at
$n \sim 100$ one gets a very marked power-law in the tail
of the outdegree distribution. Considering now the indegree,
since its behavior is determined by $\mu_n$, one has to
distinguish among the different possible scalings for this
quantity. The simplest case is $\beta > 2$, where for $m_n = [\delta n]$
($\delta$ being any constant in $(0, 1]$ and $[x]$ being the integer
part of x) and for $n \to \infty$, using the Poissonian approximation
of a binomial distribution, it is immediate to show
that the indegree distribution is asymptotically Poisson, with mean
$\lambda = \lim_{n\to\infty} m_n \mu_n = \delta\alpha(\beta-1)/(\beta-2)$. Things
are slightly more complicated for $\beta \le 2$. Here, essentially
because of the scaling for $\mu_n$ in the limit $n \to \infty$, the
indegree distribution diverges if one chooses $m_n = [\delta n]$.
Thus, to obtain a well-behaved asymptotic distribution,
one has to compensate more strongly for the scaling of $\mu_n$
with the number of rows of A. For $\beta = 2$, the necessary
choice is $m_n = [\delta n/\log n]$ rows, and for $1 < \beta < 2$ one has
to take $m_n = [\delta n^{\beta-1}]$ rows. With these prescriptions, the
indegree distribution is asymptotically Poisson, and has
the form $e^{-\lambda}\lambda^k/k!$ with $\lambda = \delta\alpha$, or $\lambda = \delta\alpha^{\beta-1}(\beta-1)/(2-\beta)$, for $\beta = 2$
and $1 < \beta < 2$ respectively. In other words, asking for
a degree distribution that leads to an outdegree having


a power-law tail with divergent mean ($\beta \le 2$) poses a
heavy constraint on the number of regulator nodes (the
rows of the matrix). On the other hand, for the purpose
of generating square ($n \times n$) matrices at finite n with
$\beta < 2$ and compact indegree, this issue is not so important.
A suitable choice of the parameter $\alpha$ (see Fig. 3)
can solve the problem. In what follows we will discuss
mainly the case of square matrices.
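
The prescription for the number of rows can be summarized as a small helper. This is an illustrative sketch: the name `regulator_rows` is hypothetical, `log` is assumed to be the natural logarithm, and the exact comparison `beta == 2.0` is only meaningful when $\beta$ is set by hand rather than fitted.

```python
from math import floor, log


def regulator_rows(n, beta, delta=1.0):
    """Number of rows m_n that keeps the indegree asymptotically Poisson,
    following the three regimes of the exponent beta."""
    if beta <= 1.0 or not 0.0 < delta <= 1.0:
        raise ValueError("need beta > 1 and 0 < delta <= 1")
    if beta > 2.0:
        return floor(delta * n)                  # m_n = [delta * n]
    if beta == 2.0:
        return floor(delta * n / log(n))         # m_n = [delta * n / log n]
    return floor(delta * n ** (beta - 1.0))      # m_n = [delta * n^(beta-1)]
```

For instance, `regulator_rows(1000, 1.5)` keeps only a few dozen regulator rows out of 1000 nodes, which illustrates how restrictive the divergent-mean regime is.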



B. Subgraphs. 

The simple structure of the random graphs generated
by our algorithm makes it possible to compute analytically
the mean value of the number of subgraphs of a
given shape contained in the graph. Consider a subgraph
H, with k nodes and m edges, that is, the set of ordered
pairs of nodes
$H = \{i_1 \to i_{1,1}, \dots, i_1 \to i_{1,m_1}, i_2 \to i_{2,1}, \dots, i_k \to i_{k,1}, \dots, i_k \to i_{k,m_k}\}$,
where $\sum_{l=1}^{k} m_l = m$.
For example, $i_1 \to i_2, i_2 \to i_3, i_3 \to i_1$ denotes a "feedback
loop" (fbl), or a 3-cycle. Now, if $G_n$ is a random
graph with n nodes generated by our algorithm, the probability
to observe H as a subgraph of $G_n$ can be written
as

$$P\{H \in G_n\} = \int_{[0,1]} \theta_1^{m_1}\, \pi_n(d\theta_1) \cdots \int_{[0,1]} \theta_k^{m_k}\, \pi_n(d\theta_k).$$



To compute the mean of the number $\mathcal{N}_H(G_n)$ of subgraphs
isomorphic to H one also has to consider the quantity
$N(H)$ of subgraphs isomorphic to H contained in the
complete graph with n nodes. The desired average is then
$\langle \mathcal{N}_H(G_n) \rangle = N(H)\, P\{H \in G_n\}$ (where $\langle \cdot \rangle$ denotes
the mean). Things are slightly more complicated for rectangular
matrices because in the evaluation of $N(H)$ one
needs to take into consideration also the constraints given
by the fact that only $m_n$ nodes can have outgoing edges.

As an example, we now evaluate, in the case of square
matrices, the mean number of feedback loops versus feedforward
loops, which play an important role for transcription
[15]. A feedforward loop (ffl) is a triangle
with the form $i_1 \to i_2 \to i_3$, $i_1 \to i_3$. It is found to
be a motif in known transcription networks, and identified
with the function of persistence filter or amplifier.
Conversely, feedback loops (which in principle could form
switches and oscillators) are usually not found in transcription
networks [7, 27]. Following the procedure described
above, one gets $\langle \mathcal{N}_{fbl}(G_n) \rangle = 2\binom{n}{3}\mu_n^3$ (an
analogous expression holds for k-cycles, with k in place of 3
and the prefactor 2 replaced by $(k-1)!$). Once more,
this can be evaluated analytically with straightforward
calculations. As it depends only on the behavior of $\mu_n$,
its scaling for large n easily follows. The evaluation of
feedforward loops is slightly more complicated. In general

$$\langle \mathcal{N}_{ffl}(G_n) \rangle = 6\binom{n}{3}\, \mu_n \int_{[0,1]} \theta^2\, \pi_n(d\theta)$$

FIG. 4: Comparison between analytical (dotted lines) and numerical
(triangles) evaluations of the mean number of some
observables as a function of system size n, for $\beta = 2.8$, $\alpha = 1$.
Numerical averages are evaluated on $10^5$ realizations. Top
and middle: mean number of three-node subgraphs. Each
subgraph is sketched next to its corresponding plot. Top:
feedforward and feedback loops (ffl and fbl). Middle: two
kinds of open triangles, that can be termed "single input modules"
(sim) and "three-gene chains" (tgc). Bottom: roots and
leaves.



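and hence, under (2),

$$\langle \mathcal{N}_{ffl}(G_n) \rangle = 6\binom{n}{3} \left[\frac{\beta-1}{(n/\alpha)^{\beta-1} - 1}\right]^2 \int_{\alpha/n}^{1} \theta^{1-\beta}\, d\theta \int_{\alpha/n}^{1} \theta^{2-\beta}\, d\theta.$$

Note that the finite n formulas above can be computed
explicitly, and so can their scaling for finite sizes. In
Appendix A, we spell out the example of ffls to
illustrate this point.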
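
Because both averages depend only on the first two moments of $\pi_n$, they are straightforward to evaluate at any finite n. The following is a minimal sketch for square matrices under Eq. (2) with $\gamma = 0$. It uses $\langle \mathcal{N}_{fbl} \rangle = 2\binom{n}{3}\mu_n^3$ together with the ffl average obtained by applying the general subgraph formula to a triangle in which one node has two outgoing edges and another has one. The function names are hypothetical, and this is not the calculation of Appendix A.

```python
from math import comb, log


def bias_moment(r, n, alpha, beta):
    """E[theta**r] under the density Z^-1 theta^-beta on (alpha/n, 1]."""
    a = alpha / n

    def integral(p):
        # closed form of the integral of theta**p over (a, 1]
        return -log(a) if p == -1 else (1.0 - a ** (p + 1)) / (p + 1)

    return integral(r - beta) / integral(-beta)


def mean_fbl(n, alpha, beta):
    """Mean number of feedback loops (3-cycles): 2 * C(n,3) * mu_n**3."""
    return 2 * comb(n, 3) * bias_moment(1, n, alpha, beta) ** 3


def mean_ffl(n, alpha, beta):
    """Mean number of feedforward loops: 6 * C(n,3) * mu_n * E[theta**2]."""
    mu = bias_moment(1, n, alpha, beta)
    return 6 * comb(n, 3) * mu * bias_moment(2, n, alpha, beta)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, mean_ffl(n, 1.0, 2.8) / mean_fbl(n, 1.0, 2.8))
```

The ratio printed in the usage lines is $3\,E[\theta^2]/\mu_n^2$, which does not involve the binomial coefficient, so the same helper can be used to explore how the ffl/fbl balance changes with n and $\beta$.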

In Fig. 4, we report a comparison of the exact cal- 
culation of some triangular subgraphs with results ob- 
tained from numerical evaluation. The agreement be- 
tween the analytical expressions and the numerics is per- 
fect. Having analytically exact expressions for any sys- 
tem size can be an advantage with respect to models 
where only asymptotically exact expressions are avail- 
able, especially thinking that many concrete datasets 
have relatively small sizes. Moreover, it is possible to
compute analytically the standard deviation of the number
of subgraphs. For example, we considered again the
number of feedback loops and feedforward loops. The
most interesting fact is that for $\beta > 2$, the former are always
more widely distributed. A sketch of the calculation
and some results are reported in Appendix B.



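Finally, one can evaluate the scaling behavior of the
ratio of feedforward to feedback loops, which is given
below

$$\frac{\langle \mathcal{N}_{ffl}(G_n) \rangle}{\langle \mathcal{N}_{fbl}(G_n) \rangle} \sim \begin{cases} n^{\beta-1} & \text{if } 1 < \beta < 2 \\ n/(\log n)^2 & \text{if } \beta = 2 \\ n^{3-\beta} & \text{if } 2 < \beta < 3 \\ \log n & \text{if } \beta = 3 \\ A & \text{if } \beta > 3 \end{cases}$$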



where $A = 3(\beta-2)^2(\beta-3)^{-1}(\beta-1)^{-1} > 1$. Thus, the
ffls always dominate, although there is a wide range of
regimes. Note that the dominance of feedforward triangles
is even stronger if one considers the rectangular
adjacency matrices discussed above. For example,
for $1 < \beta < 2$, and rectangular matrices, we calculate
$\langle \mathcal{N}_{ffl}(G_n) \rangle / \langle \mathcal{N}_{fbl}(G_n) \rangle \sim n$.



C. Roots and Leaves. 

As a second example, we report the calculation of the
mean number of roots (nodes with only outgoing links)
versus leaves (nodes with only incoming links). More
specifically, we say that i is a root if there is no edge of
the kind $j \to i$, but there is at least one edge of the kind
$i \to j$, with $j \ne i$. Loops do not count. Conversely, we
say that i is a leaf if there is no edge of the kind $i \to j$,
but there is at least one edge of the kind $j \to i$, with
$j \ne i$. Again we exclude loops and isolated points. We
find the following scaling for the numbers of roots $\mathcal{R}$, and
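of leaves $\mathcal{L}$:

$$\langle \mathcal{L}(G_n) \rangle \sim n$$

while

$$\langle \mathcal{R}(G_n) \rangle \sim \begin{cases} \dots & \text{if } \beta > 2 \\ \dots & \text{if } \beta = 2 \\ \dots & \text{if } 1 < \beta < 2 \end{cases}$$

where $\Gamma_2 = \dots$. Once again, we stress that these
quantities are accessible analytically, and there is perfect
agreement between the data generated by the algorithm
and the calculations.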
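
For square matrices, the structure behind these scalings can be sketched from the row independence in Eq. (1). The expressions below are a derivation written for this purpose, not a transcription of the original display; $q_n$ is a shorthand introduced here for the probability that a node has no outgoing edge towards the other $n-1$ nodes.

```latex
q_n := \int_{[0,1]} (1-\theta)^{n-1}\, \pi_n(d\theta), \qquad
\langle \mathcal{R}(G_n) \rangle = n\,(1-\mu_n)^{n-1}\,(1-q_n), \qquad
\langle \mathcal{L}(G_n) \rangle = n\,q_n\,\bigl[1-(1-\mu_n)^{n-1}\bigr].
```

The two factors multiply because the incoming edges of a node are tossed in rows other than its own. Since $q_n$ tends to $p_{\alpha,\beta}(0) \in (0,1)$, the number of leaves grows linearly in n, whereas the number of roots is controlled by $(1-\mu_n)^{n-1}$: this factor tends to a positive constant when $\beta > 2$ (where $n\mu_n$ stays bounded) and vanishes when $\beta \le 2$ (where $n\mu_n$ diverges).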



D. Hub. 

As a last example of an important observable in our graph
ensemble, we discuss the distribution and mean outdegree
of the hub. The so-called hub is the node having
maximal outdegree among the nodes, that is,
$H_n := \max_{i=1,\dots,m_n}(S_{n,i})$. Once again, it is possible to give an
analytical expression of the limit law of the hub under
a suitable rescaling. Indeed, by stochastic independence,
it is clear that $P\{H_n \le x b_n\} = (1 - P\{S_{n,1} > x b_n\})^{m_n}$,
where $x > 0$ is any positive number. Moreover, it is not
too hard to prove that, for suitable choices of $b_n$ and $m_n$,
$P\{S_{n,1} > x b_n\} = \frac{1}{m_n}\left[(\alpha/x)^{\beta-1} + o(1)\right]$. More precisely,
for $\beta > 2$ and for every positive number x

$$P\{H_n/b_n \le x\} \to e^{-(\alpha/x)^{\beta-1}}.$$

The above expression gives the effective probability distribution
that one can use for the hub outdegree in the
asymptotic limit. In particular, for $\beta > 2$, $m_n = n$
and $b_n = n^{1/(\beta-1)}$, and, with some work, we prove that
$\langle H_n \rangle \sim n^{1/(\beta-1)}$, as found in [11]. For $\beta = 2$, one
has to take $m_n = b_n = n/\log n$, which leads to analogous
scaling results. Finally, for $1 < \beta < 2$ and $m_n = n^{\beta-1}$,
one gets the expression

$$P\{H_n/n \le x\} \sim e^{-(\alpha/x)^{\beta-1}}\, \chi_{(0,1)}(x) + \chi_{[1,\infty)}(x)$$

for every positive x. Note that in this last case the probability
of finding a hub of size n is asymptotically finite,
and equal to $1 - e^{-\alpha^{\beta-1}}$. This concentration effect
was already noted in [11] using a different random
graph model, without computing explicitly the asymptotic
probability distribution. It is worth recalling that
$e^{-(\alpha/x)^{\beta-1}} I_{[0,+\infty)}(x)$ is the Fréchet type II extreme value
distribution, i.e. one of the three kinds of extreme value
distributions that can arise from the limit law of the maximum of
independent and identically distributed random variables
(see for instance [12]). For extreme value distributions
in scale-free network models see, e.g., [13].
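
The mechanism behind the Fréchet limit can be sketched for $\beta > 2$, $m_n = n$ and $b_n = n^{1/(\beta-1)}$. The lines below are a heuristic outline written for illustration, not the original proof: they assume that, for a large threshold, the outdegree $S_{n,1}$ exceeds $x b_n$ essentially when its bias exceeds $x b_n/n$, so that the tail of $S_{n,1}$ can be read off from Eq. (2).

```latex
P\{\theta > x b_n / n\}
  = \frac{(x b_n/n)^{1-\beta} - 1}{(n/\alpha)^{\beta-1} - 1}
  = \frac{x^{1-\beta}\, n^{\beta-2} - 1}{(n/\alpha)^{\beta-1} - 1}
  \sim \frac{1}{n}\left(\frac{\alpha}{x}\right)^{\beta-1},
\qquad
\left(1 - \frac{(\alpha/x)^{\beta-1} + o(1)}{n}\right)^{n}
  \longrightarrow e^{-(\alpha/x)^{\beta-1}} .
```

The first chain gives the $1/m_n$ tail with $m_n = n$, and the second is the elementary limit that turns the product over independent rows into the exponential.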



IV. OTHER APPLICATIONS. 

While here we restricted our attention to the case of
directed graphs with compact indegree and power-law
outdegree, our coin-toss method of generating exchangeable
graphs is more general and has a wider range of
application. For example, one can consider the following
algorithm: (i) throw a bias $\theta$ from a prescribed probability
distribution $\pi_n$; (ii) set all the elements of A to be
0 or 1 according to the toss of a coin with bias $\theta$. The
resulting probability law, for square matrices, is

$$Q\{x^{(n)}_{i,j} = e_{i,j};\ i,j = 1,\dots,n\} = \int_{[0,1]} \theta^{\sum_{i,j} e_{i,j}} (1-\theta)^{n^2 - \sum_{i,j} e_{i,j}}\, \pi_n(d\theta)$$

$e_{i,j}$ being any element in $\{0,1\}$, $i,j = 1,\dots,n$. Again
set $\mu_n := Q\{x^{(n)}_{i,j} = 1\} = \int_{[0,1]} \theta\, \pi_n(d\theta)$. The resulting
ensemble of random graphs has a large variability in the
number of links. In the $n \times n$ case, the degree distributions
are given by $Q\{S_{n,i} = k\} = Q\{Z_{n,j} = k\} = \binom{n}{k} \int_{[0,1]} \theta^k (1-\theta)^{n-k}\, \pi_n(d\theta)$.
Assuming (2) one gets

$$Q\{S_{n,i} = k\} \sim Q\{Z_{n,j} = k\} \sim p_{\alpha,\beta}(k),$$

which has, again, a power-law tail. For this model, quan- 
tities like the mean number of subgraphs, roots, leaves 
and hubs, are again easily computed analytically, in the 



same way we described above. Furthermore, throwing 
a triangular matrix with the same algorithm, one can 
easily generate a power-law model for undirected graphs. 
Finally, variants of the model can be generated by changing
the probability distribution $\pi_n$ for the biases. Overall,
all these possibilities remain open to explore and could 
be useful to generate both analytically solvable random 
graph models and quicker algorithms in many applica- 
tions. 



V. EXAMPLE OF COMPARISON WITH 
EMPIRICAL DATA. 

A detailed comparison between known real transcrip- 
tional networks and the null models obtained with our 
coin-toss algorithm is beyond our aims here. Neverthe- 
less, to show that our model can be used for direct statis- 
tical comparisons, as an alternative to the more stringent 
constraint of preserved degree sequences, we present here 
a brief application to the Shen-Orr [27] dataset for the 
E. coli transcription network. Motif discovery, for example,
entails comparing the occurrence of subgraphs in
a real network with a null ensemble. This null ensemble 
can be obtained from our coin-toss model, with some pre- 
scribed parameter set. The parameters can be chosen by 
performing a fit of the model graphs with some observed 
features of the data, such as, for example, decay of the 
degree distributions and number of regulatory elements 
(additional parameters can also be introduced if needed). 
The chosen "invariants" can be motivated biologically. 

We generated our random ensemble with distribution
$\pi_n$ given by (3) as follows. First, from a frequentist
estimate of $\pi_n$ we determined the probable value of $\beta$
and of the cutoff on the maximum, i.e. $\gamma$. This last
quantity has to be regarded as a biological constraint,
and is necessary to obtain an ensemble having on average
the same number of links as the empirical one; we
measure the upper cutoff $1 - \gamma/n$ to be about 18%. The
estimated value for $\beta$ ranges from 1.6 to 2.1, depending
on the binning of the histogram of $\pi_n(\theta)$. We note that
these values are larger than those obtained from fitting
directly on the outdegree sequence. As a second step,
we fixed the rectangularity of the matrix with the ratio
of transcription factors to total number of nodes, choosing
$\delta$ such that $m_n/n \simeq 0.2766$. Finally, we fitted $\alpha$ to
reproduce on average the observed number of links and
nodes. In practice, since the model naturally produces
a certain number of isolated nodes, one has to generate
slightly larger matrices and compare the submatrix made
of non-isolated nodes.

The ensemble obtained with this procedure fits quite
well the empirical in- and out-degree sequences (Fig. 5A).
Also, the model reproduces the empirical number of transcription
factors, roots and leaves as average values. As
a remark, we note that, unless new prescriptions for the
generation of the graphs, and thus new parameters, are
introduced, roots, leaves and transcription factors cannot
be reproduced well with smaller values of $\beta$ than the
ones we used. One can take this as a confirmation that
the range of values for the exponent obtained with our
fitting procedure is reasonable.

We also measured the three-node subgraph content of
the null model and compared it with the empirical data:
the two are very close (Fig. 5B). The
only exception is the ffl, with a slight deviation that,
however, is much less significant than with the degree-conserving
ensemble. Thus, in terms of these observables,
one obtains graphs similar to the empirical one. This
means that in the resulting ensemble the average motif
content can be regarded as an invariant, rather than as
an observable. Finally, we quantified the feedback properties
(Fig. 5C). In order to do this, we measured the
number $N_c$ of nodes left in the graph after pruning its
input and output tree-like components with an iterative
decimation algorithm [35, 36]. In particular, none of the
graphs we generated was treelike and feedforward as the
empirical one is. One may then speculate that the motif
content and the hierarchical properties, two important
properties, are somehow related.



VI. CONCLUSIONS. 

We presented an algorithmic way to generate directed 
graphs with, asymptotically, power-law outdegree and 
compact indegree, easily generalizable to different kinds 
of graphs. The discussion was carried out having in 
mind an application in the realm of transcription net- 
works, although there are many possible connections with 
other experimentally accessible complex networks, in- 
cluding biological ones. Compared to other techniques, 
our model has the advantage to be quick in generating 
large graphs, as it is not designed to preserve a prescribed 
degree sequence, but rather to generate an ensemble with 
given degree distributions. As such, it is an interest- 
ing tool to characterize topological observables in large 
graphs. Most importantly, many of the relevant observ- 
ables are accessible analytically, for any value of n. We 
supplied here as examples the evaluation of the mean
number of subgraphs, roots and leaves, and of the hub distribution.

We should add here a comment regarding the fact that it is not
evident that the proposed approach is more efficient than
the Molloy-Reed algorithm [20], which generates "stubs"
with desired in- and outdegree sequences, and matches
the stubs to generate the graphs. This model could be recast
to be similar in spirit, in the sense that one could fix
the relevant distributions depending on parameters, and
throw the degree sequences from the distributions. Once
the number of connections for each node has been drawn
from the expected degree distribution and without avoiding
multiple connections, the computational cost of pairing
the stubs is of order E (number of edges), so in sparse
networks this could be less than order $n^2$, and in non-sparse
networks it could be $n^2$. Although the algorithm
suffers from the undesired production of multiple edges,

FIG. 5: Application of the model to the Shen-Orr dataset. Example of fit and observed features. The plots refer to the
parameter set $\beta = 1.83$, $\alpha = 0.5$, $m_n/n = 0.2766$, with a cutoff on the maximum outdegree at 18% of the nodes as described in
the text. (A) In- and out-degree histograms of the empirical graph, compared to the random ensemble. While the tail of the
outdegree may not seem a good fit, we note that the integral of the > 13 tail, or the estimated number of "global regulators", of
the two laws are remarkably similar (8 in the empirical graph vs 9.7 in the randomized network) so that this has to be regarded
as a good agreement. (B) Table comparing the subgraph content (for the three-node subgraphs analyzed here) of the model
graphs with the empirical one. The two quantities are in general very similar, with the exception of the ffl, which deviates
from average, but only slightly. (C) The feedback in the graph deviates from average more than the triangular subgraphs. Left
panel: the distribution of ffls compared with the empirical value. Right panel: the feedback of the random and empirical
graphs differ. $N_c$ measures the number of nodes left in a graph after pruning the input and output trees, as described in [35].



due to the computational complexity of pairing hubs, for 
a compact indegree distribution, this computational cost 
can be small [28], allowing a practical applicability in 
some regimes. On the other hand, we think that our 
approach remains competitive, as its computational cost 
is not affected by the complexity of the graph ensemble, 
and, as we have shown, is very versatile for analytical 
calculations. 

Regarding the subgraph structure, we note that while
ffls always dominate over fbls, there are qualitatively
different behaviors depending on the exponent $\beta$. The
most marked dominance is found for smaller values of $\beta$,
and is further increased by considering rectangular matrices
(i.e. asymptotically compact indegree). Thus, the
degree distribution poses some important constraints on
the dominant subgraphs in our null model. We would like
to spend a few more words on these scaling laws with system
size n. In our model the scaling of $\mu_n$ with the decay
exponent $\beta$ drives the transitions of all the observables.
In particular, it makes it necessary to consider rectangular
matrices to obtain an asymptotically compact indegree if
$\beta \le 2$.

This behavior is interesting on theoretical grounds, and
shows how strongly unbalanced the distributions for the
in- and outdegree in transcription networks are.
For example, in the model described in section IV, where 
the indegree is allowed to have a power-law tail, the sit- 
uation is rather different. In the case of transcription 
networks, there is an observed scaling law of the fraction
of transcription factors (nodes that have at least one outgoing
link). This is a power-law $n^{\zeta}$ [29] with positive
exponent $1 < \zeta < 2$. Looking at the distribution of roots,
one easily realizes that this behavior forbids any asymp- 
totic limit assuming the graph structure of our model, 



and is thus incompatible with it. In light of our calculations,
we can observe that it is likely that for larger
values of n the outdegree ceases to follow a power-law, 
and/or the average indegree ceases to be finite, the op- 
posite trend to that observed in small networks. Experi- 
mental observations of larger transcription networks will 
elucidate this question. 



We should stress here that the above considerations apply 
mainly to the model. Nevertheless, we showed that 
in principle our model can be used for direct statistical 
comparisons, as an alternative to the more stringent 
constraint of preserved degree sequences. An example of 
such a fitting procedure produced an ensemble of networks 
that resemble the empirical one of E. coli in terms 
of degree distribution, number of links, roots, leaves and 
transcription factors. Interestingly, the null ensemble 
produced this way also has a three-node subgraph content 
very similar to that of the empirical graph. On the other hand, 
the feedback properties are very different. The outcome 
of such a comparison might depend on the invariance 
criteria used for the fitting. This is an interesting feature 
that can be used to produce flexible null models, 
depending on the quantities of interest. However, 
this feature also makes the handling of the model more 
delicate than the standard degree-conserving randomizations. 
In particular, a more exhaustive analysis than that 
presented here is needed to draw clear-cut conclusions on 
experimental graphs [37]. Clearly, the degree sequences 
of, for example, the E. coli network, are not stringently 
fixed by any physical or biological constraint. Rather, 
the network, during evolution (and within a population), 
moves in a larger "space of possible interactions", 
determined by selective pressure and other biological 
constraints, which has not been strictly identified yet. 
Generalizations of our null model might help explore this 
evolutionary problem. 



Finally, we showed how the coin-toss algorithm, or 
exchangeable graph model, has a wider range of application 
than the main example examined here. To illustrate this, 
we explained how, with the same technique, 
one can obtain directed and undirected power-law random 
graphs. Obviously, the range of possibilities is even 
larger if one starts to play with the probability distribution 
for the biases $\pi_n(d\theta)$. For this reason, on more 
abstract grounds, the model can be useful in the context 
of the theory of correlated random networks [22, 30]. It 
is a quick algorithm, easy to implement and to analyze 
theoretically. Indeed, because of its simple formulation, 
the potential for further analytical calculations is large. 
For example, one can evaluate the kernel of A, which is 
useful in connection with problems of the Satisfiability 
class, which have seldom been analyzed on non-Poisson 
random graphs [31-34]. 
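
As an illustration of how little code the coin-toss construction needs, here is a minimal Python sketch using NumPy. The function names, the exclusion of self-loops, and the specific bias density (proportional to $\theta^{-\beta}$ on $(\alpha/n, 1]$, with $\beta \neq 1$) are assumptions made for this example and are not taken from this section; any other sampler for $\pi_n(d\theta)$ can be substituted.

```python
import numpy as np


def sample_biases(n_rows, beta, theta_min, rng):
    """Draw biases with density proportional to theta**(-beta) on (theta_min, 1].

    Assumed form of the bias distribution (illustrative only); requires beta != 1.
    Uses inverse-CDF sampling of the truncated power law.
    """
    u = rng.random(n_rows)
    lo = theta_min ** (1.0 - beta)
    return (lo + u * (1.0 - lo)) ** (1.0 / (1.0 - beta))


def coin_toss_graph(n, beta, alpha, rng):
    """Square adjacency matrix: row i is filled by tossing a coin of bias theta[i]."""
    theta = sample_biases(n, beta, alpha / n, rng)
    # Entry (i, j) is 1 with probability theta[i], independently given theta.
    adj = (rng.random((n, n)) < theta[:, None]).astype(np.int64)
    np.fill_diagonal(adj, 0)  # assumption of this sketch: no self-loops
    return adj, theta


def count_ffl_fbl(adj):
    """Count feedforward loops (a->b, b->c, a->c) and feedback loops (directed 3-cycles).

    Assumes a 0/1 matrix with zero diagonal; subgraphs are counted as
    non-induced (extra edges among the three nodes are allowed).
    """
    a = adj.astype(np.int64)
    a2 = a @ a
    ffl = int((a2 * a).sum())          # two-step paths a->b->c closed by a->c
    fbl = int(np.trace(a2 @ a)) // 3   # each 3-cycle appears 3 times in the trace
    return ffl, fbl


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj, _ = coin_toss_graph(n=100, beta=1.5, alpha=1.0, rng=rng)
    print(count_ffl_fbl(adj))
```

Repeating the call to `coin_toss_graph` with fresh random draws gives samples from the ensemble, from which means and variances of the two counts can be estimated and compared with analytical expressions.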



APPENDIX A: AVERAGE OF FFL 

This appendix reports in more detail the calculation 
of the mean number of ffls for $1 < \beta < 2$. Starting 
from the definition, we obtain with straightforward 
calculations 

Note that, since the finite-$n$ formula for the mean is 
known exactly, the finite-size scaling can be computed 
analytically, simply by isolating the leading terms in the 
approach to the asymptotic limit. For example, in the 
case of the ffl average computed above, one has 

APPENDIX B: VARIANCE OF FBL VS FFL. 

We report here the calculation of the standard deviation 
of feedforward and feedback loops. The key point is to evaluate 
$\langle \mathcal{N}_{\mathrm{ffl}}(G_n)^2 \rangle$ and $\langle \mathcal{N}_{\mathrm{fbl}}(G_n)^2 \rangle$. Again, for the sake of 
simplicity, we will deal only with square matrices. It is clear 
that $\langle \mathcal{N}_{\mathrm{fbl}}(G_n)^2 \rangle = \sum_{t \in \Gamma} \sum_{s \in \Gamma} P\{s,t \in G_n\}$, $\Gamma$ being 
the set of all feedback loops contained in the complete graph 
on $n$ nodes. Analogously, one obtains $\langle \mathcal{N}_{\mathrm{ffl}}(G_n)^2 \rangle$ by taking as $\Gamma$ 
the set of all feedforward loops. Simple calculations give 

where $\mu_{i,n} := \int \theta^i \pi_n(d\theta)$. Hence, one obtains 
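$$
\mathrm{Var}(\mathcal{N}_{\mathrm{fbl}}(G_n)) \sim
\begin{cases}
n^{5(2-\beta)} & \text{if } 1 < \beta < 2 \\
(\log n)^{4} & \text{if } \beta = 2 \\
(\ldots)^{3} & \text{if } \beta > 2
\end{cases}
$$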
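
To illustrate where such scalings come from, one class of terms in the double sum over pairs of loops can be written out. This is a sketch of a single contribution, not the full formula; it assumes square matrices without self-loops and rows tossed independently, and writes $\mu_{i,n}$ for the moment $\int \theta^i \pi_n(d\theta)$.

```latex
% s, t: two feedback loops sharing exactly one node v (five distinct nodes in total).
% Row v contributes two edges, each of the other four rows contributes one edge.
P\{s,t \in G_n\} = \mu_{2,n}\,\mu_{1,n}^{4},
\qquad
\#\{(s,t)\} = n(n-1)(n-2)(n-3)(n-4),
\\
% Contribution of this class to the variance:
\sum_{(s,t)} \Bigl[ P\{s,t \in G_n\} - P\{s \in G_n\}\,P\{t \in G_n\} \Bigr]
= n(n-1)(n-2)(n-3)(n-4)\,\mu_{1,n}^{4}\bigl(\mu_{2,n} - \mu_{1,n}^{2}\bigr).
```

Pairs of loops with disjoint node sets involve different rows, are therefore independent, and contribute nothing to the variance; only overlapping pairs such as the class above survive.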

As for $\mathcal{N}_{\mathrm{ffl}}$, the computations are longer, but essentially 
the same. The difficulty is that $P\{s,t \in G_n\}$ can take 
many different expressions depending on $s$ and $t$. With 
some simple but tedious calculations one gets 

< A/; fl (G„) 2 > = 6 Q A„ + 6(n - 3) (f) B„ 

+1 <;) (»-> + »(;)(»->. 



with A n = 5l tn 62,n + 5l. n + $l, n fa,n, B n = 5 2 ,nh.n + 
58l, n 5l n +35lJ3,n+Si n +25 h j2,nS3,n+2Sin+^iJln, 
C n = 25i tn 62,n83,n + +8l,nfi4,n + 58l, n fi2,n + 82,n and D n = 

8\Jl. n - Hence, 

with $R_n = \left[\binom{n}{3} - \binom{n-3}{3}\right]$. For example, if $2 < \beta < 3$, the 
last expression gives 



Var(Af tll (G n )) ~ n 2 <" +1 >. 